-
Notifications
You must be signed in to change notification settings - Fork 191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP : get additional data from HerdingspikesSortingExtractor #3525
base: main
Are you sure you want to change the base?
Conversation
Thank you! I'll take a look asap. |
thanks for this. |
def get_unit_location( | ||
self, | ||
unit_id, | ||
segment_index=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need segment_index here, does the units chnage over segment ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't work with multi-segmented recordings so I don't have much perspective on this. Since get_unit_spike_train
is implemented at both the sorting extractor and sorting segment level I did the same here just in case.
This is what I was wondering about. The easiest solutions for now would be to return matching arrays per unit, but for the sake of reusability I guess the BaseSorting class would need to be able to handle arbitrary per-spike properties. In the case of HerdingSpikes, a lot of useful data is already computed or extracted : waveforms, locations and amplitudes are already included, PCA is also computed but not included in the output. All of this is then recomputed by the sorting analyzer. Then it would be up to individual sorting extractors to match the expected data structure. |
On the data HS extracts:
I feel peak channel and x/y locations could be put into a SortingAnayzer object, that's where they would normally be found. Would this be possible? |
I agree with Sam that the unit properties should be written as such. I would be curios about how to handle spike properties if you have them so tagging along here. |
If this is too much for now, I think the PR can stay as is and just expose unit locations as a property. Regarding individual spikes, being able to load arbitrary properties on top of spike times might require changes deep in the structure. A simpler solution might be for individual extractors to specify "pre-computed" extensions that are then loaded when the sorting analyzer is created. If the changes can be mostly contained at the extractor level, I think it could avoid some extra abstractions at the sorter level ? This might not be relevant for most use-cases but for instance, computing waveforms on BioCAM data takes about 30 minutes with 100 spikes per unit, and the binary weighs about 180 GB. Meanwhile, the data is already on the HDF5 file read by the extractor, with a total size of less than 200 MB (same dtype, roughly the same cutout times). |
It seems there already was some attempt to extract the additional info with
load_unit_info
a while ago.For now, this PR can extract the location of a unit with
get_unit_location
, as you would withget_unit_spike_train
.The unit locations are loaded by default, as I don't think the extra memory and computation costs should be very significant.
The rest of the data that HerdingSpikes provides is per spike :
This should be quite a bit more memory intensive to retrieve, and I'm not entirely sure of the use case for it. Nevertheless, I can see that it was considered as a possibility in the code that was already there.
Any feedback would be appreciated !